Enhancing topology adaptation in information-sharing social networks 
The advent of the Internet and the World Wide Web has led to unprecedented growth of the information
available. People usually cope with information overload by following a limited number of sources
which best fit their interests. It has thus become important to address issues like who gets followed
and how to allow people to discover new and better information sources. In this paper we conduct an
empirical analysis of different on-line social networking sites, and draw inspiration from its results
to present different source selection strategies in an adaptive model for social recommendation. We
show that local search rules which enhance the typical topological features of real social communities
give rise to network configurations that are globally optimal. These rules create networks which are
effective in information diffusion and resemble structures resulting from real social systems.

PACS numbers: 89.75.-k, 89.65.Ef 



I. INTRODUCTION 

The fast development of the Internet has caused the
amount of available information to grow dramatically.
As a result, people can hardly find what they are
interested in. The problem of delivering the right content
to the right person has attracted much attention in
recent years. A possible solution is represented by
Recommender Systems [1–3], which act as personalized
information filters by analyzing users' profiles and past
activities. Techniques used to produce recommendations
include Collaborative Filtering [2–4], Bayesian clustering [5],
Probabilistic Latent Semantic Analysis [6], matrix
decomposition [7], diffusion and conduction [8–11] and many
others. However, it was recently shown that the similarity
of users' past activities plays a less important role than
social influence: people value recommendations obtained by
abstract mathematical analysis less than those coming from
their friends or peers [12]. Social recommendation has hence
emerged as a new approach which makes direct use of the social
connections between members of a society [13]. Examples of
social recommendation implementations include services like
Delicious.com, Flickr.com, LiveJournal.com, YouTube.com,
FriendFeed.com and Twitter.com, where users can select
other users as information sources and follow them
by importing or receiving their URLs, photos, journals,
videos, feeds and microblogs, respectively. In these
systems information spreads from a user to her followers,
and eventually to the followers' followers, and so forth.
This diffusion mechanism resembles the spreading of
epidemics or rumors over a network [14–16].

A recently proposed news recommendation model [17–20]
mimics the spreading process typical of social systems
and combines it with an adaptive network of connections.
In this model, when a user reads a piece of news (or
a different kind of content), she can either "approve" or
"disapprove" it. If approved, the news spreads to the
user's followers. Thus each user receives news from
other users who represent her current leaders (i.e.
information sources). Simultaneously with the spreading
of news, the leader-follower network evolves with time in
order to connect users with similar tastes. A key aspect
of this model is hence how to find good sources for
each user. In [18] the authors propose a hybrid strategy
for updating leaders, based on local search and random
off-trap moves, that is able to efficiently optimize the
network of connections. The local part of this strategy
considers, for each user, the leaders of her current leaders
as potential candidates, thereby increasing the clustering
coefficient of the network. However, this approach leaves
aside other potentially good candidates. For instance,
real-life examples reveal that a follower of a user is very
likely to become a good leader for her too, as suggested
by the high value of link reciprocity in many
information-sharing social networking services.

In this paper we first conduct an empirical analysis of
different on-line social networks, showing that real social
communities are characterized by high values of link
reciprocity and clustering coefficient. Then, building on
the adaptive model introduced in [17], we pose the following
question: if the users' choice of leaders were guided by an
automated recommendation method, which methods would lead
to good choices? Inspired by our empirical results, we
propose and compare different local leader updating
strategies. Using an agent-based framework, we study the
features of the resulting artificial network topology from
the viewpoint of users' satisfaction, network adaptation and
recommendation efficiency. We rely only on local search
rules because centralized search mechanisms are very
demanding and practically infeasible for large-scale
networks. However, we show that this apparent drawback can be
overcome by an apt choice of these rules: local awareness of
the network becomes as effective as global knowledge in
producing optimal topologies. Moreover, we find that an
effective local updating strategy actually enhances both the
reciprocity and the clustering coefficient of the network,
thereby mimicking the users' search for sources (or, in
general, acquaintances) in social networks.

TABLE I. Statistics of social networking sites.

                    Delicious    Flickr       LiveJournal  YouTube      FriendFeed
Date of crawl       05-2008      01-2007      12-2006      01-2007      09-2009
Number of users     854,357      1,715,255    5,203,764    1,138,499    513,588
Number of links     2,521,187    22,613,980   76,937,805   4,945,382    19,810,789
Mean user degree    2.95         13.18        14.78        4.34         38.58
Reciprocity         0.392        0.624        0.734        0.791        0.207
Clustering          0.161        0.165        0.255        0.077        0.146



II. EMPIRICAL ANALYSIS 

In this section we extract the features of five different
on-line information-sharing social networking sites:
delicious.com, flickr.com, livejournal.com, youtube.com
and friendfeed.com. In these systems users form a social
network and can share different kinds of content:
respectively bookmarks, photos, blog articles, videos and
feeds. Table I gives an overview of the features of the
different systems. Note that we excluded both isolated nodes
and self-loops from the analysis. To describe the networks'
topologies, we use the formalism of the adjacency matrix A
(whose entry a_ij equals one if there is a link from i to j,
and zero otherwise) and measure two standard quantities:

• Link reciprocity (r) is the tendency of node pairs to
form connections between each other and is defined
as the ratio of the number of bi-directed links to the
total number of links in the network [21]:

    r = \frac{\sum_{i \neq j} a_{ij} a_{ji}}{\sum_{i \neq j} a_{ij}}    (1)

• Clustering coefficient (c) measures the tendency of
the network to form tightly connected components
and is defined as the ratio of the number of directed
link triangles that exist among a user and her first
neighbors to the total number of triangles that can
exist among these users, averaged over all U users [22]:

    c = \frac{1}{U} \sum_{i=1}^{U} \frac{[(A + A^{T})^{3}]_{ii}}{2\,[d_i (d_i - 1) - 2 d_i^{\leftrightarrow}]}    (2)

  where d_i = \sum_j (a_{ij} + a_{ji}) and d_i^{\leftrightarrow} = \sum_j a_{ij} a_{ji}
  are respectively the total degree and the number of
  bi-directed links of node i.
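Equations (1) and (2) can be evaluated directly from a list of directed links. The following is a minimal pure-Python sketch, not the code used for Table I: the function names `reciprocity` and `clustering` are our own, self-loops are discarded as in the analysis, nodes whose denominator in Eq. (2) vanishes contribute zero to the average, and the cubic-time loop is only meant for toy graphs, not for crawls with millions of users.

```python
def reciprocity(edges):
    """Link reciprocity r, Eq. (1): fraction of directed links i->j whose
    reverse link j->i is also present. Self-loops are discarded."""
    links = {(i, j) for i, j in edges if i != j}
    if not links:
        return 0.0
    bidirected = sum(1 for i, j in links if (j, i) in links)
    return bidirected / len(links)


def clustering(edges, nodes):
    """Directed clustering coefficient, Eq. (2), averaged over `nodes`.

    Uses W = A + A^T, so that (W^3)_ii counts directed triangles through i.
    Nodes with a vanishing denominator contribute zero to the average."""
    links = {(i, j) for i, j in edges if i != j}
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    w = [[0] * n for _ in range(n)]
    for i, j in links:
        w[index[i]][index[j]] += 1
        w[index[j]][index[i]] += 1
    total = 0.0
    for i in range(n):
        d_tot = sum(w[i])                          # d_i = sum_j (a_ij + a_ji)
        d_bi = sum(1 for x in w[i] if x == 2)      # d_i^<-> = sum_j a_ij a_ji
        denom = 2 * (d_tot * (d_tot - 1) - 2 * d_bi)
        if denom <= 0:
            continue
        cube = sum(w[i][j] * w[j][k] * w[k][i]
                   for j in range(n) for k in range(n))  # (W^3)_ii
        total += cube / denom
    return total / n


# Example: a directed 3-cycle has no reciprocal links and c = 0.5.
cycle = [(0, 1), (1, 2), (2, 0)]
print(reciprocity(cycle), clustering(cycle, [0, 1, 2]))  # prints: 0.0 0.5
```

A fully connected directed triad gives r = 1 and c = 1, which is a quick sanity check of the normalization in Eq. (2).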



These quantities are the simplest and most widely used
measures to describe the local link structure of a network
and the relationships among close nodes (in terms of
distance on the network). Other important measures of
network topology, like node degree or centrality, do not
focus on the interconnections between neighboring nodes
and were consequently excluded from our analysis.

Delicious.com, previously known as del.icio.us, is the
world's largest online bookmarking website. Users of
delicious.com collect URLs as bookmarks; moreover, they
can select other users to be their leaders (i.e. information
sources) and follow them by importing their bookmarks.
Hence we can naturally represent the delicious.com
community by a directed leader-follower network. To extract
the network's structure, we perform a crawl of the user
graph by accessing the public web interface provided by
the site: starting from a user, we follow her outgoing and
incoming links to reach other users, and so on. This
algorithm is known as breadth-first search (BFS) [23]. The
data collection started in May 2008, and the dataset
consists of 854,357 users and 2,521,187 directed links among
them; more than 99% of these users belong to the
giant component.

Flickr.com, LiveJournal.com and YouTube.com are web
services in which users can select other users as friends
(leaders, as intended in this paper) to get access to their
content (photos, blogs and videos, respectively). The
leader-follower networks of these on-line communities
were obtained in [24] by crawling the large weakly
connected component of the corresponding user graphs. The
algorithm used for the crawl was again BFS, with the
snowball method [25]: the data extraction starts from a set
of seed users and then expands by following the outgoing
links of these users to reach new users, and so on.
FriendFeed.com is a microblogging service in which
users can share short messages with a list of contacts, who
can comment back under the original messages. It is
also a feed aggregator, importing data from several other
services like Twitter, Facebook, YouTube, Flickr and
Google Reader. The leader-follower network we analyze
was crawled in [26].
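Both crawling procedures described above are variants of breadth-first search. The sketch below is illustrative only: `crawl` is our own name, and `out_links` / `in_links` are hypothetical callables standing in for queries to a site's public interface (rate limits, pagination and failures are ignored). Passing both callables follows outgoing and incoming links, as in the delicious.com crawl; omitting `in_links` gives the snowball variant that expands only along outgoing links from a seed set.

```python
from collections import deque


def crawl(seeds, out_links, in_links=None, max_users=None):
    """Breadth-first crawl of a directed user graph.

    out_links(u) / in_links(u) return the users that u links to / that
    link to u (hypothetical stand-ins for web queries). With
    in_links=None only outgoing links are followed (snowball crawl).
    Returns the users in visiting order and the set of directed links seen."""
    seen, queue = set(), deque()
    for s in seeds:
        if s not in seen:
            seen.add(s)
            queue.append(s)
    visited, links = [], set()
    while queue and (max_users is None or len(visited) < max_users):
        u = queue.popleft()
        visited.append(u)
        # Each entry pairs a directed link with the user at its other end.
        pairs = [((u, v), v) for v in out_links(u)]
        if in_links is not None:
            pairs += [((v, u), v) for v in in_links(u)]
        for link, other in pairs:
            links.add(link)
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return visited, links


# Toy graph: 4 -> 1 -> 2 -> 3; user 4 is reachable from 1 only via in-links.
graph = {1: [2], 2: [3], 3: [], 4: [1]}
incoming = {u: [v for v in graph if u in graph[v]] for u in graph}
print(crawl([1], graph.get, incoming.get)[0])  # prints: [1, 2, 4, 3]
print(crawl([1], graph.get)[0])                # prints: [1, 2, 3]
```

The toy example shows the practical difference between the two variants: a snowball crawl misses users who only point into the explored region.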

The results of our analysis are summarized in Table I.
We immediately notice that both the level of link
reciprocity and the degree of clustering are remarkably
high in all social networks: between three and five orders
of magnitude larger than the respective values in
Erdős–Rényi random graphs [27] with the same number of
nodes and links as the real networks. This phenomenon
has a natural explanation in information-sharing social
communities: if two users have common interests, each of
them can likely provide the other with the right content;
also, people tend to be introduced to other people via
mutual friends, increasing the probability that two friends
of a single user are also friends. In the following sections
we will draw inspiration from these observations to define
the topology evolution rules of an adaptive model for
social recommendation.
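The comparison with random graphs can be checked with the numbers of Table I alone. In a directed Erdős–Rényi graph with U nodes and M links, both the expected reciprocity and the expected clustering coefficient reduce to the link density p = M/[U(U − 1)]. The short computation below is an illustrative sketch (the names `SITES` and `er_ratios` are ours) that reports log10(r/p) and log10(c/p) for each site; with these figures the excess ranges from about 10^3.3 (FriendFeed) to about 10^5.4 (LiveJournal).

```python
import math

# Table I: (users, directed links, reciprocity r, clustering c).
SITES = {
    "Delicious":   (854_357,    2_521_187,  0.392, 0.161),
    "Flickr":      (1_715_255,  22_613_980, 0.624, 0.165),
    "LiveJournal": (5_203_764,  76_937_805, 0.734, 0.255),
    "YouTube":     (1_138_499,  4_945_382,  0.791, 0.077),
    "FriendFeed":  (513_588,    19_810_789, 0.207, 0.146),
}


def er_ratios(users, links, r, c):
    """Orders of magnitude (log10) by which r and c exceed the value
    expected in a directed Erdos-Renyi graph of the same size, where
    both quantities equal the link density p = links / (users * (users - 1))."""
    p = links / (users * (users - 1))
    return math.log10(r / p), math.log10(c / p)


for name, data in SITES.items():
    lr, lc = er_ratios(*data)
    print(f"{name:12s} r/p = 10^{lr:.1f}   c/p = 10^{lc:.1f}")
```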



III. MODEL DESCRIPTION 

We now briefly summarize the adaptive news recommendation
model based on [17, 18] that will be used for
the study of different leader selection strategies.

The system consists of a network of U users, each con- 
nected to L other users (the user's leaders) by directed 
links. The value of L is fixed as users usually follow a 
limited number of sources. Users receive news from their 
leaders and eventually read and forward them to their 
followers. In addition, they can introduce new content to 
the system. 

The evaluation of news α by user i (e_iα) is either +1 (liked),
−1 (disliked) or 0 (not read yet). The similarity of reading
tastes of users i and j (s_ij) is estimated by comparing the
users' past assessments: if i and j evaluated N_ij news
in common and agreed A_ij times, their similarity can be
measured in terms of the overall probability of agreement

    s_{ij} = \frac{A_{ij}}{N_{ij}} \left(1 - \frac{1}{\sqrt{N_{ij}}}\right),    (3)

where the term in parentheses disadvantages user
pairs with small overlap N_ij (which are more sensitive
to statistical fluctuations). For N_ij < 1, s_ij is replaced
by a small positive value s_0, reflecting the fact that even
when two users have no evaluations in common, there is some
base similarity of their interests. Apart from their ratings,
no other information about users is assumed by the model.
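Equation (3) translates into a few lines of code. In this sketch the function and argument names are our own: each user's ratings are stored as a dictionary mapping news identifiers to +1 or −1, with unread news simply absent, so that N_ij is the size of the intersection of the two key sets and A_ij counts matching ratings within it. The base similarity s_0 must be supplied by the caller.

```python
import math


def similarity(ratings_i, ratings_j, s0):
    """s_ij = (A_ij / N_ij) * (1 - 1 / sqrt(N_ij)), or s0 when N_ij = 0.

    ratings_* map news ids to +1 (liked) or -1 (disliked)."""
    common = ratings_i.keys() & ratings_j.keys()          # news rated by both
    n = len(common)                                       # N_ij
    if n < 1:
        return s0
    agreements = sum(1 for a in common if ratings_i[a] == ratings_j[a])  # A_ij
    return (agreements / n) * (1 - 1 / math.sqrt(n))


# Four common news, three agreements: 0.75 * (1 - 1/2) = 0.375.
print(similarity({1: 1, 2: -1, 3: 1, 4: 1}, {1: 1, 2: 1, 3: 1, 4: 1}, s0=1e-4))
```

Note that with this form a single common evaluation (N_ij = 1) yields s_ij = 0, so pairs need several shared ratings before their similarity becomes appreciable.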
Propagation of news works as follows. When news α
is introduced to the system by user i at time t_α, it is
passed from i to her followers j with a recommendation
score proportional to their similarity s_ij. If this news is
later liked by one of the users j who received it, it is
similarly passed further to this user's followers k (with
recommendation score proportional to s_jk), and so on. For
a generic user k at time t, news α is recommended according
to its current recommendation score:

    R_{k\alpha}(t) = \delta_{e_{k\alpha},0} \sum_{l \in L_k} s_{lk} \, \delta_{e_{l\alpha},1} \, e^{-(t - t_\alpha)/\tau}.    (4)

Here L_k is the set of leaders of user k and δ is the
Kronecker symbol: δ_{e_kα,0} = 1 only when user k has not
read news α yet, and δ_{e_lα,1} = 1 only if user l liked
news α. The sum accounts for a user receiving the same
news from multiple leaders: recommendation scores are
summed up in this case, reflecting that a news liked by
several leaders is more likely to be liked by this user too.
Finally, to allow fresh news to be accessed quickly,
recommendation scores are exponentially damped with time,
with τ ∈ (1, ∞) being the scale of the damping.
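Equation (4) can be sketched as a single function. The data layout is an assumption made for illustration: `leaders` maps users to sets of leaders, `ratings` maps users to dictionaries of +1/−1 evaluations (unread news absent), `sim(l, k)` returns s_lk, and `t_submit` records the submission time t_α of each news item.

```python
import math


def recommendation_score(k, news, t, leaders, ratings, sim, t_submit, tau):
    """R_{k,news}(t) of Eq. (4) for user k at time t."""
    if news in ratings[k]:            # delta_{e_k,news, 0}: already read
        return 0.0
    # Sum s_lk over the leaders l of k who liked the news (delta_{e_l,news, 1}).
    total = sum(sim(l, k) for l in leaders[k] if ratings[l].get(news) == 1)
    return total * math.exp(-(t - t_submit[news]) / tau)  # time damping


leaders = {0: {1, 2, 3}}
ratings = {0: {}, 1: {"a": 1}, 2: {"a": 1}, 3: {"a": -1}}
# Two of the three leaders liked "a", each with similarity 0.5 to user 0.
print(recommendation_score(0, "a", 0.0, leaders, ratings,
                           lambda l, k: 0.5, {"a": 0.0}, tau=10.0))  # prints: 1.0
```

In a full simulation each user's recommendation list would be the unread news sorted by this score, of which the top R are read when the user is active.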

Simultaneously with the propagation of news, con- 
nections of the leader-follower network are periodically 
rewired to drive the system to an optimal state where 
users with high similarity (taste mates) are directly
connected. When rewiring occurs for user i, the leader with
the lowest similarity value (j) is replaced with a new user
(k) if s_ik > s_ij. There are different selection strategies
for picking new candidate leaders:

1. Random rewiring: k is simply a user picked at
random in the network.

2. Local rewiring: k is the user in the neighborhood
of user i with the maximum value of s_ik. This
mechanism is based on the observation that two
users who have common acquaintances are likely
to have similar interests. As discussed below,
there are different ways to define such a
neighborhood.

3. Hybrid rewiring: Random rewiring is used in some
cases and Local rewiring in the others. This
mechanism mimics both users searching for friends among
friends of friends (local rewiring) and having casual
encounters which may lead to long-term relationships
(random rewiring).

4. Global rewiring: k is the user who maximizes s_ik
among all U users (this is a local rewiring with the
neighborhood being the whole network).
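The four strategies differ only in how the candidate k is drawn, so a single function can cover them. The sketch below is our own illustration, not the reference implementation: `rewire`, `neighborhood` and `p_random` are hypothetical names, and excluding user i and her current leaders from the local pool is a design choice made here. Setting `p_random=1` gives Random rewiring, `p_random=0` gives Local rewiring, and `p_random=0` with a neighborhood returning all users gives Global rewiring; the default of 0.1 mirrors the 10% randomness used in the simulations.

```python
import random


def rewire(i, leaders, users, sim, neighborhood, p_random=0.1, rng=random):
    """One (hybrid) rewiring step for user i.

    leaders[i]      -- set of i's current leaders (modified in place)
    users           -- list of all users, used for random picks
    sim(i, u)       -- similarity s_iu
    neighborhood(i) -- iterable of candidate leaders for the local search
    Returns True if the least similar leader j was replaced by k."""
    current = leaders[i]
    j = min(current, key=lambda u: sim(i, u))       # leader with lowest s_ij
    if rng.random() < p_random:
        k = rng.choice(users)                       # random (off-trap) pick
    else:
        pool = [u for u in neighborhood(i) if u != i and u not in current]
        if not pool:
            return False
        k = max(pool, key=lambda u: sim(i, u))      # best local candidate
    if k == i or k in current or sim(i, k) <= sim(i, j):
        return False
    current.remove(j)
    current.add(k)
    return True


users = [0, 1, 2, 3]
leaders = {0: {1}}
# Toy similarity: s_iu = u, so user 3 is the best possible leader for user 0.
rewire(0, leaders, users, lambda i, u: float(u), lambda i: [1, 2, 3], p_random=0.0)
print(leaders[0])  # prints: {3}
```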



A. Topology adaptation 

The search for new and better information sources is 
a fundamental feature of many social communities. In 
the model described above, the leader updating proce- 
dure is intended to drive the network to an optimal state 
where users with high similarity are directly connected, 
so that the system is able to efficiently deliver right news 
to right users. Among the rewiring strategies described
above, global search mechanisms like Global rewiring
are very demanding for large-scale networks and also
unfeasible without centralized control, whereas Random
rewiring is very inefficient, as good new leaders are
hardly found by chance. One is therefore constrained to
use local search rules. The basis of Local rewiring is
to define the "neighborhood" of a user, i.e. a set of close
users in the network who stand for possible candidate
leaders. The choice of a specific neighborhood should be
clever enough to allow users to actually find their taste
mates. On the one hand, the pool of candidate leaders should
not be too large, as in this case the search becomes
unmanageable for the system. On the other hand, if the
neighborhood size is very small (compared to the whole
network), the rewiring may stop at a sub-optimal assignment
of leaders: the topology evolution halts if users' better
leaders are at some moment outside their neighborhoods
(they can never be reached), meaning that the algorithm
got trapped in a sub-optimal state [18]. A possible
solution to this problem is to employ some percentage of
randomness in the selection, as in Hybrid rewiring.
In this way users may happen to get connected regardless
of their distance, and the pool of candidate leaders
for each user is potentially the whole network. In the
following analysis we always use Hybrid rewiring with 10%
randomness, to exploit mainly the local search while
avoiding trapping in a local minimum (see [18] for a
detailed study of the effect of the randomness percentage
on the rewiring efficiency).

FIG. 1. Local network structure of one user. Links' directions
reflect how information flows between users.



B. Neighborhood definition 

We shall now define the "neighborhood", i.e. the set
of candidate leaders exploited by Hybrid rewiring.
The local network structure from a user's viewpoint is
represented in Figure 1. At distance one from the user
there are two sets of users: her leaders (L) and followers
(F). L and F form the first layer from the user.
At distance two, we find four different sets of users: her
leaders' leaders (LL), leaders' followers (LF), followers'
leaders (FL) and followers' followers (FF). These sets
form the second layer from the user. Notice that the
described sets may overlap with each other (e.g. a user can
be leading but also following another user). Given this
scheme of the local network structure, we have to consider
which of these sets contain potentially good leaders
for the user.

Apart from the current set of leaders L, the first layer
contains a good set of candidates, represented by F.
Indeed, if user i is a good leader of user j, meaning that
j obtains valuable information from i, then i and j are
likely to have some common interests and the similarity
between them can be high. Hence user j can also provide
user i with the right content and be a good information
source for her. This assumption is supported by the high
value of link reciprocity in many information-sharing
social networks (see Table I). Including F in the candidate
set hence increases the probability of having reciprocal
links. However, this set may be too small to be
considered alone. Therefore we move further to the second
layer. The leaders' leaders set (LL) was considered in
[18], where the authors observed that since user j obtains
valuable information from her leader i and such information
may come from i's leaders, j can have similar interests
with i's leaders and benefit from following them.
Again, this assumption is supported by the high value of
the clustering coefficient in many social networks (Table
I). Analogous considerations lead us to take into account
also the LF, FL and FF sets.

In the following sections we will study the behavior of 
the described model when different rewiring methods are 
employed. When using Hybrid rewiring, we will denote it 
by the neighborhood that it exploits. For instance it will 
be named as LL if only leaders' leaders are considered as 
candidates, and LL + F if also followers are included. 
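The six sets of Figure 1 and the named candidate pools can be derived from the leader lists alone. This is a minimal sketch with hypothetical names (`neighborhood_sets`, `candidates`); the spec strings follow the naming convention just introduced, so the whole second layer plus followers would be written "LL+LF+FL+FF+F". Removing user i and her current leaders from the pool is our own assumption, since they cannot become new leaders.

```python
def neighborhood_sets(i, leaders):
    """First- and second-layer sets around user i.

    leaders[u] is the set of u's leaders; followers are derived from it.
    Returns a dict with keys L, F, LL, LF, FL, FF (the sets may overlap)."""
    followers = {u: set() for u in leaders}
    for u, ls in leaders.items():
        for l in ls:
            followers.setdefault(l, set()).add(u)

    def expand(group, relation):
        # Union of relation[u] over all u in group.
        return set().union(*(relation.get(u, set()) for u in group))

    L = set(leaders.get(i, ()))
    F = set(followers.get(i, ()))
    return {"L": L, "F": F,
            "LL": expand(L, leaders), "LF": expand(L, followers),
            "FL": expand(F, leaders), "FF": expand(F, followers)}


def candidates(i, leaders, spec):
    """Candidate pool for a neighborhood such as "LL+F"; user i and her
    current leaders are excluded."""
    sets = neighborhood_sets(i, leaders)
    pool = set().union(*(sets[name.strip()] for name in spec.split("+")))
    return pool - {i} - sets["L"]


# 0 follows 1, 1 follows 2, and users 2 and 3 follow 0.
leaders = {0: {1}, 1: {2}, 2: {0}, 3: {0}}
print(sorted(candidates(0, leaders, "LL+F")))  # prints: [2, 3]
```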



IV. RESULTS 

For numerical tests of the model, we use an agent-based
framework within an artificial network. The tastes of
user i are represented by a D-dimensional binary vector
t_i and the attributes of news α by a D-dimensional binary
vector a_α. A taste vector is assigned to each user at the
beginning of the simulation, whereas news have their
attributes assigned when they enter the system. Each vector
has a fixed number, D_A, of elements equal to one (active
tastes), and all remaining elements equal zero. We always
set up the system so that all mutually different user taste
vectors are present exactly once: U = \binom{D}{D_A} [18]. This
also means that the taste vectors of two users differ in
at least two elements. Hence we define as "taste-mates"
users whose taste vectors differ in exactly two elements.
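The population of taste vectors is easy to enumerate. The sketch below uses the parameter values listed in Table II (D = 14, D_A = 6) and stores each binary vector as the frozenset of its active positions, an implementation choice of ours; the number of elements in which two vectors differ is then the size of the symmetric difference. It confirms that U = 3003 and that each user has D_A(D − D_A) = 48 taste-mates, obtained by swapping one active element for an inactive one.

```python
from itertools import combinations
from math import comb

D, D_A = 14, 6  # dimension and number of active tastes (Table II)

# Every distinct taste vector with D_A active elements, present exactly once.
tastes = [frozenset(c) for c in combinations(range(D), D_A)]
U = len(tastes)
assert U == comb(D, D_A)


def differences(t1, t2):
    """Number of vector elements in which two taste vectors differ (1-norm)."""
    return len(t1 ^ t2)


# Taste-mates of the first user: vectors differing in exactly two elements.
mates_of_first = [t for t in tastes if differences(tastes[0], t) == 2]
print(U, len(mates_of_first))  # prints: 3003 48
```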
The opinion of user i about news α is based on the overlap of
the user's taste vector with the news's attribute vector

    \Omega_{i\alpha} = (t_i, a_\alpha),    (5)

where (·, ·) is the scalar product of two vectors. If
Ω_iα ≥ Δ, user i likes news α (e_iα = +1); otherwise she
dislikes it (e_iα = −1). Here Δ is the users' approval
threshold.
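The evaluation rule of Eq. (5) amounts to a thresholded scalar product. A minimal sketch, with our own function names and vectors given as lists of 0/1 entries; the default threshold Δ = 3 is the value listed in Table II.

```python
def opinion(taste, attributes):
    """Omega_{i,alpha}: scalar product of two binary vectors, Eq. (5)."""
    return sum(t * a for t, a in zip(taste, attributes))


def evaluate(taste, attributes, delta=3):
    """e_{i,alpha} = +1 if Omega >= Delta (liked), otherwise -1 (disliked)."""
    return 1 if opinion(taste, attributes) >= delta else -1


print(evaluate([1, 1, 1, 0, 0], [1, 1, 1, 0, 0]))  # prints: 1  (overlap 3)
print(evaluate([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))  # prints: -1 (overlap 2)
```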

Simulation runs in discrete time steps. In each step,
an individual user is active with probability p_A. When
active, the user reads and evaluates the top R news from
her recommendation list and, with probability p_S, submits
a new piece of news with attributes identical to the user's tastes.

To build the artificial leader-follower network for the 
spreading of news, we always start from an initial network 
configuration with random assignment of leaders to users. 
Then we use the chosen rewiring strategy to update the 
connections after every u time steps. 

Parameter values used in all following simulations are
reported in Table II.

To measure the system's performance we use:

• Approval fraction, the ratio of news approvals to
all assessments: it tells us how often users are
satisfied with the news they read and is defined as
Σ_{k,α} δ_{e_kα,1} / Σ_{k,α} |e_kα| (here again δ is the
Kronecker symbol). In general this quantity can be
affected by many factors other than the system's
performance; however, it is a significant measure to
consider, as the main goal of any recommender system is
to have users satisfied with what they are recommended.
This is a necessary condition for the service to work.

• Average differences, the average number of vector
elements in which users differ from their leaders:
it is defined as Σ_i Σ_{j∈L_i} ||t_i − t_j||_1 / (UL) (here
||·||_1 is the 1-norm) and measures how well the network
has adapted to users' tastes. This is an important
quantity to consider, as our recommendation method is
based on an adaptive network evolution with the aim of
having taste-mates directly connected. Achieving a low
value of average differences is thus a sign of good
network adaptation.

TABLE II. List of parameters used in simulations.

Parameter                             Symbol   Value
Number of users                       U        3003
Number of leaders per user            L        10
Dimension of taste vectors            D        14
Number of active tastes per vector    D_A      6
Users' approval threshold             Δ        3
Probability of being active           p_A      0.05
Probability of submitting a news      p_S      0.02
Number of news read when active       R        3
Period of the rewiring                u        10
Scale of the time decay               τ        10
Base similarity for users             s_0      10^-4
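Both performance measures are simple averages over the simulation state. The sketch below assumes a data layout of our own choosing: `ratings` maps each user to a dictionary of +1/−1 evaluations (unread news absent), `tastes` maps users to 0/1 lists, and `leaders` maps users to sets of leaders. Since every user has exactly L leaders, dividing by the number of leader links is the same as dividing by UL.

```python
def approval_fraction(ratings):
    """sum delta_{e,1} / sum |e| over all (user, news) evaluations."""
    values = [e for r in ratings.values() for e in r.values()]
    assessed = sum(1 for e in values if e != 0)
    liked = sum(1 for e in values if e == 1)
    return liked / assessed if assessed else 0.0


def average_differences(tastes, leaders):
    """sum_i sum_{j in L_i} ||t_i - t_j||_1 divided by the number of links."""
    total = links = 0
    for i, ls in leaders.items():
        for j in ls:
            total += sum(abs(a - b) for a, b in zip(tastes[i], tastes[j]))
            links += 1
    return total / links


print(approval_fraction({0: {"a": 1, "b": -1}, 1: {"a": 1}}))  # 2 of 3 liked
```

The lowest attainable value of the average differences is 2, reached when every leader is a taste-mate of her follower.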

Figure 2 shows the approval fraction (a) and the
average differences (b) at different time steps of the
network's evolution and for different definitions of the
neighborhood exploited by Hybrid rewiring. Global and
Random methods are shown as benchmarks. We observe
that every strategy gradually improves both approval
fraction and average differences. As expected, if we limit
the pool of candidate leaders to F, users are not very
satisfied because they can hardly find good information
sources. This is the result of having considered a very
small set (the average number of followers of a user
equals L). If instead we define LL as the neighborhood,
as in [18], we significantly improve both users'
satisfaction and the network's adaptation speed. This is
because the candidate set is much wider in this case: there
are on average L[L − r − (L − 1)(c + Lq/2)] different
leaders' leaders for a user, and this number is much greater
than L for typical values of r, c and q (here q is the
probability that four users are linked in a closed square
structure). To further improve the performance of the
system, we expand the candidate pool to LL + F. With
this definition of the neighborhood we promote at the
same time the reciprocity and the clustering coefficient
of the network, obtaining a surprising effect: both
approval fraction and average differences become as good
as the ones obtained by Global rewiring, i.e. by
considering the whole network as the candidate leader set. In
other words, such a small local scale turns out to be as
representative as the whole-network scale. Hence further
expanding the candidate set to the whole second layer
(i.e. LL + LF + FL + FF + F, or 2nd layer + F for short)
does not bring any substantial improvement.

FIG. 2. Evolution of approval fraction (a) and average
differences (b) for different rewiring strategies (single realization
of the system). Since taste-mates have exactly two different
taste vector elements, the lowest possible value of the average
differences is two, which corresponds to a globally optimal
assignment of leaders in the network [29].

We also measure the values of link reciprocity and
clustering coefficient in the network. We first introduce
two reference values for r and c. In the initial random
network the average probability to find a reciprocal link
between two connected vertices is simply equal to the
average probability of finding a link between any two
vertices, which is given by (UL)/[U(U − 1)] = L/(U − 1).
Hence we have r_0 = L/(U − 1). This equality also
holds for the probability to find a closed link triangle
among three users, hence c_0 = L/(U − 1). Instead,
if the network is in a structure-less optimal configuration
where each user has randomly chosen L of her
N = D_A(D − D_A) taste-mates as leaders, then the value
of the reciprocity becomes r* = L/N. Besides, in this
network state the probability that two taste-mate neighbors
of a user are also taste-mates with each other is given
by (D − 2)/(N − 1). To show this, consider two taste-mate
users: there are (D_A − 1) + (D − D_A − 1) = D − 2
other users who are taste-mates with both of them and
(D_A − 1)(D − D_A − 1) = N − D + 1 who are taste-mates
with only the first of them. The clustering coefficient is
given by the above-mentioned probability conditional on
the existence of a link: c* = [(D − 2)/(N − 1)][L/N].

TABLE III. Statistics of artificial networks at equilibrium for different rewiring strategies. The numbers in parentheses are
the errors on the last significant figures; a.f. and a.d. stand for approval fraction and average differences, respectively. The
reference values for r and c in our setting are r* = 0.208 and c* = 0.053.

Strategy         a.f.       a.d.      r         c         1−α       1−β
Global           0.870(4)   2.35(2)   0.63(2)   0.24(2)   0.96(2)   0.10(4)
2nd layer + F    0.870(4)   2.35(3)   0.65(2)   0.27(2)   0.96(1)   0.10(3)
LL + F           0.866(2)   2.43(1)   0.61(1)   0.25(1)   0.97(1)   0.07(2)
LL               0.735(6)   2.54(6)   0.13(1)   0.17(1)   0.96(1)   0.04(3)
F                0.670(5)   3.85(4)   0.85(1)   0.01(0)   0.81(0)   0.28(0)
Random           0.739(1)   2.86(2)   0.12(0)   0.03(0)   0.82(0)   0.32(0)
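The reference values and the counting step behind c* can be checked numerically. This sketch plugs the Table II parameters into the formulas above and then verifies by brute force, over the enumerated taste vectors, that two taste-mates share exactly D − 2 common taste-mates; the variable names are ours. The results, r_0 = c_0 ≈ 0.0033, r* ≈ 0.208 and c* ≈ 0.053, agree with the values quoted in the caption of Table III.

```python
from itertools import combinations

D, D_A, U, L = 14, 6, 3003, 10  # Table II values

N = D_A * (D - D_A)                 # taste-mates per user
r0 = c0 = L / (U - 1)               # initial random network
r_star = L / N                      # structure-less optimal network
c_star = (D - 2) / (N - 1) * L / N

# Brute-force check: two taste-mates u, v share exactly D - 2 taste-mates.
tastes = [frozenset(c) for c in combinations(range(D), D_A)]
u = tastes[0]
v = next(t for t in tastes if len(u ^ t) == 2)
common = [t for t in tastes if len(u ^ t) == 2 and len(v ^ t) == 2]

print(f"r0 = c0 = {r0:.4f}, r* = {r_star:.3f}, c* = {c_star:.3f}, "
      f"common mates = {len(common)}")
```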

The stationary values of r and c are reported in Table
III. For any rewiring method, the reciprocity coefficient
increases with respect to its initial value r_0. As expected,
the final value of r is very high with F (by construction,
F promotes reciprocity) and low for Random and LL.
In the latter cases, r becomes close to r*. The other
methods achieve similar values of r, which are comparable
with those of real social networks. The clustering
coefficient shows an opposite trend: starting from c_0, it
becomes very large with LL (by construction), whereas
it remains quite small for Random and F, converging to
values close to c*. For the other methods the stationary
values of c are again comparable with those of real social
networks.

The fact that the values of r and c in our artificial
network are high is not surprising, as the local search
strategies are built with the aim of enhancing these
quantities. Since our main purpose is not to model the
evolution of a real social network, we only provide a
quantitative comparison of reciprocity and clustering values
in real and artificial systems. In the adaptive model each
user has L = 10 connections, which is of the same order of
magnitude as the average degree of a user in the studied
real social communities (Table I); hence the numbers of links
per user in these systems are comparable. The values of r
and c in our artificial system (which are not set by hand
but result from the topology adaptation) and in real
networks are also of the same order of magnitude. We can
conclude that these networks, despite having different
sizes, exhibit similar local link patterns.



Finally, we discuss the efficiency of the modeled
recommender system. When making recommendations, it
is possible to make two different kinds of error:
recommending content that users would not like, and not
recommending content that users would like. These
errors are known respectively as type I (false positives)
and type II (false negatives) errors [30]. To complete the
picture, true positives are recommendations of content
that users would like, and true negatives are
non-recommendations of content that users would not like.
Note that false positives upset users while false negatives
do not (i.e. a type I error has more serious consequences
than a type II error); hence a good recommendation engine
should mainly reduce false positives. We further introduce
the specificity (1 − α) and the sensitivity (1 − β) of the
recommendation system as its ability to avoid false
positives and false negatives, respectively [31]:



    1 - \alpha = \frac{TN}{TN + FP}, \qquad 1 - \beta = \frac{TP}{TP + FN},

where TP, TN, FP and FN are respectively the number
of true positives, true negatives, false positives and false
negatives. To measure these quantities in our artificial
setting, we define α as the average number of wrong
recommendations for a news over the number of users who
might dislike this news, given by
\sum_{k=0}^{\Delta-1} \binom{D_A}{k} \binom{D-D_A}{D_A-k},
and 1 − β as the average number of good recommendations
for a news over the number of users who might like
this news, given by
\sum_{k=\Delta}^{D_A} \binom{D_A}{k} \binom{D-D_A}{D_A-k}.
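The denominators above follow from a counting argument: a news item carries D_A active attributes, and the number of users whose taste vector shares exactly k of them is C(D_A, k)·C(D − D_A, D_A − k). The sketch below (helper names are ours; parameter values from Table II, and assuming, as in the simulations, that every taste vector is present exactly once) evaluates both sums. With D = 14, D_A = 6 and Δ = 3 they split the 3003 users into 1414 potential dislikers and 1589 potential likers of any given news item.

```python
from math import comb

D, D_A, DELTA = 14, 6, 3  # Table II values


def users_by_overlap(k):
    """Users whose taste vector shares exactly k active elements with a
    news item whose attribute vector has D_A active elements."""
    return comb(D_A, k) * comb(D - D_A, D_A - k)


might_dislike = sum(users_by_overlap(k) for k in range(0, DELTA))      # k < Delta
might_like = sum(users_by_overlap(k) for k in range(DELTA, D_A + 1))   # k >= Delta


def specificity(tn, fp):
    """1 - alpha = TN / (TN + FP)."""
    return tn / (tn + fp)


def sensitivity(tp, fn):
    """1 - beta = TP / (TP + FN)."""
    return tp / (tp + fn)


print(might_dislike, might_like)  # prints: 1414 1589
```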
Table III also reports the stationary values of specificity
and sensitivity for the recommender system when
different source selection strategies are employed.
Specificity is remarkably high for all methods, especially for
the best performing ones, hence the number of false
positives in the system is very low. Sensitivity instead shows
an opposite trend: the Random and F updating strategies
are now the best performing. We see that the effort
of reducing one type of error results in increasing the
other type, as generally happens in statistical tests. In
our case the reason behind this phenomenon is the
presence of tightly connected components in the system: in a
highly clustered network, news have few paths to spread
far from the users who post them (and the spreading
process takes a long time), hence they tend to remain
localized. As a consequence, few users receive a given piece
of news, but almost all of them like it. When clustering is
low, a news item has more spreading directions, hence it can
reach many users, but more of them eventually dislike it.
However, we are mainly interested in having a recommender
system with high specificity, and in this respect simple local
strategies like LL + F again perform at the same level
as global search in generating optimal network structures
for recommending and sharing information.



V. CONCLUSION 

How to recommend the right content to the right 
person and which are this person's favorite information 
sources are fundamental questions in the age of infor- 
mation overload. In this work we exploited a recently 
proposed news recommendation model which combines 
similarity of users' past activities and social relation- 
ships to obtain recommendations, and which mimics the 
spreading process typical for social systems where the 
network of connections continually evolves with time [17].
The topology evolution is intended to provide users with 
newer and better information sources. Since global op- 
timization of the users' connections is computationally 
prohibitive for a large system, a key issue of the model is 
where to find good new leaders for users. Taking real life 
as inspiration, we designed different local leader search 
strategies which mimic the users' search of sources in real 
social communities. We then studied the resulting evo- 
lution and properties of our artificial system and showed 
that with these local search rules the users' community 
can self-organize into optimal topologies which are equivalent
to the ones that can be generated by global knowledge
of the system. Indeed the resulting artificial networks
have high values of reciprocity and clustering,
similarly to the real information-sharing social communities
studied in section |lll. Therefore our automated abstract
rules help to create networks which not only are
effective for the spreading of information but also resem- 
ble structures resulting from real human activity. 

We would like to remark that our main goal is not to 
model the evolution of a real network. The recipes we 
proposed for the optimization of local network connec- 
tions are instead a valuable tool which may find appli- 
cation in many systems other than the recommendation 
model presented in this paper — among which social and 
p2p networks are just a few examples. The observed fea- 
tures of the studied social networks suggest that these 
local rules for topology adaptation also stand for possi- 
ble mechanisms underlying the microscopic evolution of 
real social communities [32].

Finally, we recall that the adaptive recommendation
model presented in this work is new, as it combines
similarity-based and social recommendation with the
spreading of news. An agent-based framework such as the one
we used in this work represents an ideal playground for
testing the model and can significantly contribute
to our understanding of the system [33]. In past works
it was used to assess the filtering efficiency of the system
[18] and to study the formation of scale-free leadership
structures [20]. In this work it was employed to test and
validate the proposed strategies for network adaptation,
a task which would have been hard to achieve in a field
study with real users. Agent-based modeling is also useful
for comparing our method with other recommendation
techniques (Appendix A and [17]). In addition to
our efforts to obtain robust simulation results (Appendix B),
it would still be beneficial to have direct empirical
input on user behavior. We are working on an on-line
implementation of the system and we expect to have in
the near future relevant data about real user actions,
which will be valuable material for future research.

The basic feature of our model is recommendations 
coming from sources selected by the users themselves. As 
there is increasing evidence that users are more inclined 
to buy products [34] or to like content [35] recommended
by friends or trusted peers than by others or by abstract 
mathematical algorithms, our new adaptive recommen- 
dation method is a promising candidate to be employed 
in various social and commercial applications. 
Appendix A: Comparison with other
recommender systems 

In the adaptive model proposed here, the recommendation
process is based on the computation of users' similarity
scores from their past assessments. In this respect,
the model is similar to a widely adopted recommendation
technique: memory-based collaborative filtering
(CF) [2, 3]. The idea behind CF is to predict the
interests of a user by collecting past preferences from
the community the user belongs to, under the assumption
that users who agreed in the past tend to agree again
in the future. In particular, in memory-based CF the
recommendation score for an object is computed as a
weighted average of the ratings given by other users, with
weights proportional to the similarity between users

$$R^{\mathrm{CF}}_{ia} = \frac{\sum_{j=1}^{U} s_{ij}\, c_{ja}}{\sum_{j=1}^{U} s_{ij}}, \qquad \text{(A1)}$$

where $c_{ja}$ is user $j$'s opinion about object $a$ and
$s_{ij}$ is the similarity between users $i$ and $j$. An
alternative approach is to consider for each user only the
top-$L$ most similar users and to use solely their assessments
to compute the recommendation scores [55]:

$$R^{\mathrm{top}\text{-}L}_{ia} = \frac{\sum_{j \in \mathcal{L}_i} s_{ij}\, c_{ja}}{\sum_{j \in \mathcal{L}_i} s_{ij}}, \qquad \text{(A2)}$$

where $\mathcal{L}_i$ is the set of the $L$ users most similar to $i$.
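A minimal Python sketch of the two CF variants, written with plain nested lists. The names `cf_scores` and `top_l_cf_scores` are hypothetical, and two details are assumptions not fixed by the text: missing opinions are encoded as 0, and a user is never counted as her own neighbor.

```python
def cf_scores(sim, opinions):
    """Memory-based CF: R[i][a] = sum_j s_ij * c_ja / sum_j s_ij.

    sim      -- U x U nested list of user similarities s_ij
    opinions -- U x O nested list of opinions c_ja (0 where j has no opinion, assumed)
    A user is never used as her own neighbor (s_ii is ignored).
    """
    n_users, n_objects = len(sim), len(opinions[0])
    scores = []
    for i in range(n_users):
        weights = [0.0 if j == i else sim[i][j] for j in range(n_users)]
        norm = sum(weights)
        scores.append([
            sum(w * opinions[j][a] for j, w in enumerate(weights)) / norm if norm else 0.0
            for a in range(n_objects)
        ])
    return scores


def top_l_cf_scores(sim, opinions, L):
    """Top-L CF: zero out all but the L largest similarities of each user, then apply CF."""
    n_users = len(sim)
    pruned = []
    for i in range(n_users):
        others = sorted((j for j in range(n_users) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        keep = set(others[:L])
        pruned.append([sim[i][j] if j in keep else 0.0 for j in range(n_users)])
    return cf_scores(pruned, opinions)
```

Implementing top-L CF as ordinary CF on a pruned similarity matrix keeps the two variants visibly related: the only difference is which neighbors receive non-zero weight.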
Recently, diffusion-based methods were employed to efficiently
extract the similarity between objects or users [HI
[TU]. Among these methods, which can be considered
as extensions of CF, the probabilistic spreading
(ProbS) algorithm was shown to outperform other standard
memory-based CF techniques [HI [H]. ProbS is
based on the bipartite network of U users and O objects,
where a link between user i and object a exists if
a was collected by i (i.e., the corresponding entry $b_{ia}$
of the bipartite adjacency matrix equals one, and zero
otherwise). For each user i, ProbS assigns each object β an
initial level of resource $f_\beta = b_{i\beta}$ and then redistributes
it to obtain recommendations for uncollected objects via

$$R_{ia} = \sum_{j=1}^{U}\sum_{\beta=1}^{O} \frac{b_{ja}\, b_{j\beta}\, b_{i\beta}}{k_j\, k_\beta}, \qquad \text{(A3)}$$

where $k_j$ and $k_\beta$ denote the degrees of user $j$ and object
$\beta$ in the bipartite network.
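The two-step redistribution behind ProbS can be sketched in Python as follows. The function name and the choice to mark already-collected objects with minus infinity (so that they sort last in a recommendation list) are illustrative assumptions.

```python
def probs_scores(adj, exclude_collected=True):
    """ProbS (mass diffusion) on a user-object bipartite network.

    adj[i][a] is b_ia: 1 if user i collected object a, else 0.
    Computes R[i][a] = sum_j sum_b b_ja * b_jb * b_ib / (k_j * k_b): each object
    collected by user i starts with one unit of resource, which first spreads evenly
    to the object's k_b users and then evenly from each user j to her k_j objects.
    """
    n_users, n_objects = len(adj), len(adj[0])
    k_user = [sum(row) for row in adj]
    k_obj = [sum(adj[j][a] for j in range(n_users)) for a in range(n_objects)]
    scores = []
    for i in range(n_users):
        # Step 1: objects -> users.
        user_res = [sum(adj[j][b] * adj[i][b] / k_obj[b]
                        for b in range(n_objects) if k_obj[b])
                    for j in range(n_users)]
        # Step 2: users -> objects.
        obj_res = [sum(adj[j][a] * user_res[j] / k_user[j]
                       for j in range(n_users) if k_user[j])
                   for a in range(n_objects)]
        if exclude_collected:
            # Objects already collected by i are not candidates for recommendation.
            obj_res = [float("-inf") if adj[i][a] else r for a, r in enumerate(obj_res)]
        scores.append(obj_res)
    return scores
```

With `exclude_collected=False` the resource is conserved: each user's raw scores sum to her degree, since every step only redistributes what was received.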
We will use these three approaches to test our
adaptive model (see [17] for a comparison with simpler
recommendation methods).

While our adaptive model and these recommender systems
are intrinsically different, with the time dimension
embedded in the first but totally missing in the others, we
can still compare the performance of the various methods
in the agent-based framework as follows. We let the
adaptive system run until it sufficiently approaches
equilibrium, then we freeze the evolution and store the
current set of users' recommendation lists produced by
the adaptive model. Since in the artificial setting we have
the luxury of knowing the opinion of each user about
each news item, we can check how many of the news items in a
user's list would be liked by the user (we take into account
only the top S places, as real users usually consider
only the top recommendations). Averaging these values
over all users we obtain the mean precision p of the
recommendation process j^. A still better perspective is
given by considering p relative to the precision of random
recommendations $p_r$ (which can be easily evaluated
in our agent-based setting). This gives the precision
enhancement $p/p_r$. To assess the performance of the other
methods, we use all users' assessments resulting from the
system's evolution to compute the similarities between
users [37] and the consequent recommendation scores by
CF and top-L CF, and to build the user-news bipartite
network [38] for computing the recommendation scores
by ProbS. From these scores we obtain a recommendation
list for each user, and then we proceed as described
before to obtain the precision enhancement values.

FIG. 3. Precision enhancement of the recommender system
(left) and average age of the recommended news (right) by
the adaptive model (optimized by the LL + F rewiring with
T = 10) and by CF, top-L CF (L = 10) and ProbS algorithm.
We have used S = 20. Upper and lower plots refer to different
system settings in which users are less and more demanding
(A equal to 3 and 4, respectively).
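The evaluation procedure described above can be sketched as follows, assuming that scores and true opinions are available as U × N tables. Computing $p_r$ as the overall fraction of liked user-news pairs (i.e. the expected precision of recommending news uniformly at random) is an assumption of this sketch, as is the function name.

```python
def precision_enhancement(scores, likes, S=20):
    """Precision enhancement p / p_r of top-S recommendation lists.

    scores -- U x N nested list, recommendation score of news n for user i
    likes  -- U x N nested list of booleans, True if user i would like news n
              (available because opinions are known in the agent-based setting)
    p is the fraction of liked items in each user's top-S list, averaged over users.
    p_r is taken as the overall fraction of liked user-news pairs, i.e. the expected
    precision of recommending news uniformly at random (an assumption of this sketch).
    """
    precisions = []
    for row_scores, row_likes in zip(scores, likes):
        ranked = sorted(range(len(row_scores)), key=lambda n: row_scores[n], reverse=True)
        top = ranked[:S]
        precisions.append(sum(row_likes[n] for n in top) / len(top))
    p = sum(precisions) / len(precisions)
    p_r = sum(sum(row) for row in likes) / sum(len(row) for row in likes)
    return p / p_r
```

For example, two users who each like exactly one of four news items, both ranked first, give p = 1 and p_r = 0.25 at S = 1, hence an enhancement of 4.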

However there is a fundamental difference between the 
recommendations by the adaptive model and the ones by 
the other algorithms. While in the first the time evo- 
lution and the damping mechanism allow only recom- 
mendations of fresh news, in the others there is no such 
restriction. Hence CF and ProbS often recommend old 
news which are in principle liked by users but which are 
not of topical interest. This represents a major draw- 
back, especially for a news recommendation engine. To 
overcome this flaw and to have a fair comparison, we 
can modify recommendation scores by applying the same 
damping mechanism as in the adaptive model Q:

$$R_{ia} \leftarrow R_{ia}\left(1-\frac{1}{\tau}\right)^{t-t_a} \qquad \text{(A4)}$$
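A minimal sketch of the damping step of (A4) applied to precomputed scores. It assumes the multiplicative form $R_{ia}(1 - 1/\tau)^{t - t_a}$, with $t_a$ the time at which news a was introduced and t the current time; both this form and the function name `damp_scores` are assumptions for illustration.

```python
def damp_scores(scores, news_times, t, tau):
    """Damp recommendation scores by news age, in the spirit of (A4).

    Assumed form: R_ia <- R_ia * (1 - 1/tau) ** (t - t_a), where t_a is the time
    at which news a was introduced and t is the current time.
    """
    factor = 1.0 - 1.0 / tau
    return [[r * factor ** (t - t_a) for r, t_a in zip(row, news_times)]
            for row in scores]


# An old, high-scoring item versus a fresh, lower-scoring one (tau = 2, current time 10).
print(damp_scores([[0.9, 0.5]], news_times=[0, 9], t=10, tau=2))
```

In the usage line the fresh item ends up ahead of the old one despite its lower raw score, which is exactly the effect that lets damped CF and ProbS favor topical news.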



Figure 3 shows the precision values and the average
age of recommended news $(t - t_a)$ for the adaptive model
and the other recommender systems. We observe that the
adaptive model substantially outperforms CF and ProbS
unless one is allowed to recommend also very old news
(t ^ 10). This is because these methods make use of
all available information and are able to compute the
recommendation score for almost every user-news pair.
If the time decay is weak, these methods tend to
recommend old and already popular content, as is typical
of standard recommendation techniques, which are usually
ineffective when dealing with recently introduced objects
for which little user feedback is available. On the
other hand, if the decay is strong, the freshness of the news
becomes dominant over their recommendation scores, to
the detriment of precision. Our model can instead produce
very precise recommendations for fresh news. This
is because the memory-based component (i.e. the similarity
estimation) serves to select a few best information
sources for each user, who act as her information filters.
Then, given the resulting social network, users only get
very precise recommendations, among which the freshest
emerge thanks to the damping mechanism. The
importance of considering only recommendations coming
from taste-mates is also reflected by the performance
of top-L CF, which is very close to that of the adaptive
model for similar values of τ (the top-L approach
is in fact a global rewiring at each step followed by the
filling of users' recommendation lists). The advantage of
our model with respect to this last approach is that the
local search rules for leader selection and the recommendations
automatically resulting from the news spreading
process overcome the limitations of top-L CF related to
scalability and real-time performance [3S]. In addition,
our model is naturally open to manual selection of leaders
by users themselves, thereby exploiting social
recommendation with all its benefits [TH [T3].



Appendix B: Assumptions 
of agent-based modeling 

The agent-based model introduced in the previous sec- 
tion was very helpful for understanding how our adaptive 
system behaves. At the same time the complexity of its 
assumptions is such that it is not clear if the reported 
behavior is general or if it is parameter-dependent. Here 
we discuss the robustness of our results with respect to 
individual assumptions. 

The first important point to clarify is that the observed
features of the system do not depend on the number of
users U. We ran simulations of a network 15 times larger,
with U = $\binom{18}{8}$ = 43758 and A = 4, and observed that
the simple LL + F again performs as well as
the global search. Another important parameter is the
number L of leaders per user, which regulates how many
news items a user receives and, in general, how many spreading
directions there are in the system. Figure 4 shows how
the system behaves for different values of L. When L
is small, news propagate slowly and hardly reach their
intended audience. This is why the system is not able to
adapt well (users' rating histories do not overlap enough),
and despite the very low rate of false positives, users
end up reading also news that do not fit their
interests. On the other hand, if L is large, news get a large
audience but the system loses its filtering efficiency: there
are many false positives that leave users unsatisfied.
For a given system size, one can hence set L to a value
which gives the best compromise between 1 − α and 1 − β,
which also results in a maximum of the approval fraction
(L ≈ 10 for local search methods, in our case).

FIG. 4. Stationary values of approval fraction, average differences,
reciprocity, clustering, specificity and sensitivity of the
system for various numbers of leaders per user (L).

Moving further, R, $p_A$ and $p_s$ regulate the amount of
news which spread over the network at a given time and
the rate of news consumption by users. We set them so as
to always have enough content in the system, so that
users have access to diverse news at every step and
actually read news forwarded by others. However, we observe
that the system works reasonably well unless one
of two possible situations arises. If too few news items
circulate in the system, users who are hungry for information
will end up reading content that they will not
like: no recommender system can work if there is too little
choice. On the other hand, if too many news items are
available, each user will read different news: there
is too little information and thus the similarity values
cannot always be evaluated properly. If such a situation
occurs in reality, one might employ additional filtering
to prevent these negligible effects. Even in this
case, however, neighboring users are likely to have some
overlap of reading histories, hence local search methods
can still be successfully employed.

τ is the time scale for the decay of recommendation
scores, and has to be set such that news can
survive for enough steps to spread widely, allowing
similarity values to be assessed reliably. Finally, A, D and
$D_A$ determine how demanding users are and how
wide their scopes of interest are (see [17] for a discussion
of how the system's performance relates to these quantities),
whereas u is the frequency of link rewiring, which
is set to an optimal value for having both effective
recommendations and low computational complexity [19].